POV-Ray : Newsgroups : povray.programming : Why not generate parser with Bison & Flex? : Re: Why not generate parser with Bison & Flex? Server Time
29 Jul 2024 02:26:08 EDT (-0400)
  Re: Why not generate parser with Bison & Flex?  
From: Evan Powers
Date: 31 Dec 1998 18:13:51
Message: <368c0618.27900056@news.povray.org>
I've moved some of the quotes around, by the way.

On Wed, 30 Dec 1998 19:27:20 +0100, "Thorsten Froehlich"
<fro### [at] charliecnsiitedu> wrote:

>Note that these tools are designed for defined languages, not constantly extended
>languages like the POV-Ray scene description language, but before I continue, please
>note that I never used Bison and Flex myself, but I have seen code generated by them.

Sorry, you are mistaken. If you read up on the Bison grammar
description language (http://www.gnu.org/manual/bison/index.html) you
will realize that it is designed with a dynamic language definition in
mind. Adding new language features and changing existing ones is
simplistic.

As an example, here is a grammar file (from the Bison manual, with
minor modifications) specifying how an expression can be built from
subexpressions and operators. (Bison generates a function that parses
expressions, evaluates them, and prints the result from this file,
assuming there is an acompanying lexer.)
	/* Infix notation four-function calculator */
	%{
		#define YYSTYPE double
		#include <math.h>
	%}

	%token NUM	/* a number */
	%left '-' '+'
	%left '*' '/'

	%%

	exp:	NUM		{ $$ = $1; }
		| exp '+' exp	{ $$ = $1 + $3; }
		| exp '-' exp	{ $$ = $1 - $3; }
		| exp '*' exp	{ $$ = $1 * $3; }
		| exp '/' exp	{ $$ = $1 / $3; }
		| exp '\n'	{ printf ("%.10g\n", $1); };
	<EOF>
	
Note that this grammar file fully defines the operator precedence and
associativity.

Say I want to extend the expression parser to include some additional
operators:
	/* Infix notation calculator, more robust */
	%{
		#define YYSTYPE double
		#include <math.h>
	%}

	%token NUM	/* a number */
	%left '-' '+'
	%left '*' '/'
	%left NEG	/* negation--unary minus */
	%right '^'	/* exponentiation        */

	%%

	exp:	NUM			{ $$ = $1; }
		| exp '+' exp		{ $$ = $1 + $3; }
		| exp '-' exp		{ $$ = $1 - $3; }
		| exp '*' exp		{ $$ = $1 * $3; }
		| exp '/' exp		{ $$ = $1 / $3; }
		| '-' exp  %prec NEG	{ $$ = -$2; }
		| exp '^' exp		{ $$ = pow ($1, $3); }
		| '(' exp ')'		{ $$ = $2; };
		| exp '\n'		{ printf ("%.10g\n", $1); };
	<EOF>

I concede that this is not the best example; however, it tells enough
about the Bison grammar language to allow the extrapolation of how
modifications to the POV grammar could be made.

>And the parser is only a (minor) part, you still the "compiler" part for the language
>- and this goes for POV-Ray as well, see below.

>Well, making a grammar file does not write the code 'converting' the parser output
>into POV-Ray's internal data structures, and as this the part the parser code you see
>as difficult to understand does very well and even transparent to some extend.

It is important to realize that, as defined by Bison, only the code
within the POV parser specifing what language element should come next
is part of the parser; the rest of the code, the part that builds the
internal data structures, is *not*. If the POV parser were written
with Bison, this code would be a Bison action (such as "{$$ = $1 +
$3}" above) instead of being embedded within the parsing logic. Thus,
a Bison POV parser would be just as stand-alone as PARSE.C.

I grabbed the following code snippet out of PARSE.C from the POV
source archive. I have marked the lines that are actually part of the
parser with a '%'.
%	CASE (SCALE_TOKEN)
%		Parse_Scale_Vector (Local_Vector);
		Compute_Scaling_Transform(&Local_Trans, Local_Vector);
		for (Current=First; Current!=NULL;
Current=Current->Sibling)
		{
			Scale_Object (Current, Local_Vector,
&Local_Trans);
		}
%	END_CASE

Just as making a grammar file does not write the code building POV's
internal data structures, writing the following using the existing
parser macros doesn't either:
	CASE (SCALE_TOKEN)
		Parse_Scale_Vector (Local_Vector);
	END_CASE

>Why are these macros funny or cryptic - once you got used to them they really ease
>reading the parser code (I think)?

Bearing my above statements in mind, I believe the argument is about
which grammar description language is superior. I, for one, belive a
language separating the grammar definition and the interpretation code
is superior to one in which the two are intermixed. Furthermore, I
belive that defining a language according to its overall structure,
then the overall structure of its components, then the overall
structure of the components' components, and so on, makes more
intuitive sense and is easier to read and maintain than a definition
of a language based upon what can follow each particular fundamental
element.

Which of these makes more sense? (Which is more concise?)
A) Bison-style.
	sentance:
		subject action predecate
	subject:
		[noun-modifiers] noun
	action:
		[verb-modifiers] verb
	predecate:
		[direct-object]
		| indirect-object direct-object
		| direct-object [indirect-object]
B) Current POV parser style.
	sentance:
		[noun-modifiers] noun-modifiers-followers
	noun-modifiers-followers:
		noun noun-followers
	noun-followers:
		[verb-modifiers] verb-modifiers-followers
	verb-modifiers-followers:
		verb verb-followers
	verb-followers:
		[direct-object]
		| indirect-object io-followers
		| direct-object do-followers
	io-followers:
		direct-object
	do-followers:
		[indirect-object]

>However, it is true that the POV-Ray parser is not the absolute optimal design, but
>it is more flexible and significantly easier to extend than Bison generated code (if
>you want to do so by hand...).

You don't extend the Bison generated code; you extend the original
grammar definition file. (I will grant you that one would not want to
manualy generate a parser from a Bison grammar definition file. But
then, that's what Bison is for!) Look again at the above grammar
definition methods. Which do you think is easier to extend? If you're
unsure, try modifying each definition to state that both a direct
object and an indirect object can consist of optional noun modifiers
followed by a noun.

>Yes, Bison and Flex are (always) available on Unix, there are a lot of different
>implementations (and version) on different platforms, and if someone on one platform
>wants to extend the language and uses a different version of Bison and Flex ther will
>be different code each time someone else generates the code!

GNU Bison generates the same code regardless of the host system type,
and is the de facto standard for parser generators. For the sake of
argument, however, I will assume that POV developers use several
versions of Bison.

With this assumption, your statement is true--but then, it doesn't
really matter. If someone wanted to extend the language, they would
modify the grammar definition file, *not* the generated parser. In
fact, I see no conceivable reason for anyone to even *look* at the
generated parser. (Not even someone debugging a Bison parser looks at
the generated parser code.) The file destributed in the POV source
archive would be the Bison grammar file, not the generated parser
file.

As for the fact that different versions generate different code:
All Bison-style parser generators use the same fundamental algorithm.
Even Yacc, Bison's ancestor, uses the same algorithm. The only
significant differences in the generated code are those caused by
whether the generated code is optimized for speed or size. I contend
that these differences are inconsequential, since no one actually
looks at the generated code anyway and since different command line
options can generate comparable code with different Bison versions.
Furthermore, the POV development team is unconcerned about the fact
that Borland, Watcom, Microsoft, and GNU compilers generate different
code from the same platform independent source file, and that, for
example, a LIBPNG compiled by Borland C cannot be used with a version
of POV compiled by GNU C. (I contend that there is no fundamental
difference between these two situations, and therefore are
comparable.)

>This would be very confusing and not work (easy) for a cross platform application
>*and* development like POV-Ray.

This is not true. A given Bison grammar file is platform independent,
since Bison generates platform independent ANSI C, provided the
user-supplied C code within its actions is platform independent, just
as a parser using the POV macros is platform independent provided the
code generating the internal data structures is platform independent.

>Yes, the code they output is fast, but (very well) handwritten, speciallized parsers
>will most likely be much faster even languages like C++, it is just that handwritten

Knowing what I do about the Bison algorithm, I very much doubt that a
handwritten parser written in the style of PARSE.C could in any way
compare in terms of speed. Read the chapters in the Bison manual about
the Bison algorithm to see what I mean. (The URL is above.)

>parsers for such languages are not as common because these languages are "closed"
>(now), while the POV-Ray scene language is "open" (and will always be).  While a

(Incidentally, I believe the C and C++ language standards are just as
"open" as the POV-Ray scene language. $18 per electronic copy of the
C++ language standard--price from www.nssn.org--doesn't sound like the
price of a closed standard to me. Besides, that explains why AT&T
isn't the only company who can legally sell a C or C++ compiler.)

>parser for example for C++ will only have to be generated once, and the compiler code

Haven't you heard of templates, exception handling, and RTTI (Run Time
Type Identification)? All are *very* recent additions to the C++
language standard.

>itself will do the rest, for POV-ray this will result in: Run Flex and Bison,
>integrate it with the POV-Ray code, check if everything still works, add the code
>supporting the extension to the language is the data setup code, recompile and hope
>there will not be the need for a last minute extension like the "material" syntax in

Granted, the initial coversion of PARSE.C into a Bison grammar file
will be considerable work. However, it only has to be done once, and
is more easily modified to include additional language elements than
the current macro-based system.

>3.1...for the current, handwritten parser you learn to use (or print them out) the
>few(!!!) macros and make an extension like the "material" statement with just a few
>lines of code to change and *no* further work.

I fail to understand how the work involved in modifying a Bison
grammar description file is greater than that involved in modifying
PARSE.C.

Besides, your position is like arguing that AT&T should never have
abandoned Lparse in favor of a direct C++ to assembler implementation
because Lparse was perfectly usable, even though a direct
implementation had significant advantages. (Yes, C++ was originally
implemented with a C preprocessor.)

>Well, I think Thomas Baier (in the team now) has done some of this work some time
>ago.

I would be interested in knowing more about this....

--Evan Powers
EPT### [at] aolcom


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.